Databricks Templatized Transformation Jobs

Data transformation is the process of converting, cleansing, and structuring raw data into a usable format that can be analyzed to support decision-making. It typically involves defining the target structure, mapping the data, extracting data from the source system, removing duplicates, converting data types, and enriching the dataset.

Lazsa Data Pipeline Studio (DPS) provides templates for creating transformation jobs. These jobs include join, union, and aggregate operations that group or combine data for analysis.
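A templatized job's join and aggregate steps amount to logic like the following hedged sketch. Plain Python stands in here for the PySpark code a Databricks job would actually run, and all table, column, and value names are hypothetical, not Lazsa-defined identifiers:

```python
# Sketch of the kind of join/aggregate a templatized transformation
# job performs; names and data are illustrative only.
from collections import defaultdict

orders = [
    {"order_id": 1, "customer_id": 10, "amount": 120.0},
    {"order_id": 2, "customer_id": 11, "amount": 80.0},
    {"order_id": 3, "customer_id": 10, "amount": 50.0},
]
customers = [
    {"customer_id": 10, "region": "EMEA"},
    {"customer_id": 11, "region": "APAC"},
]

# Join: attach each customer's region to their orders.
region_by_customer = {c["customer_id"]: c["region"] for c in customers}
joined = [
    {**o, "region": region_by_customer[o["customer_id"]]} for o in orders
]

# Aggregate: total order amount per region.
totals = defaultdict(float)
for row in joined:
    totals[row["region"]] += row["amount"]

print(dict(totals))  # {'EMEA': 170.0, 'APAC': 80.0}
```

In an actual Databricks job the same steps would be expressed as DataFrame `join` and `groupBy` operations generated from the template.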

For complex operations on data, Lazsa DPS provides the option of creating custom transformation jobs. You write the transformation logic yourself, while the DPS UI helps you build SQL queries by selecting specific columns of tables. Lazsa combines these SQL queries with your transformation logic to generate the code for the custom transformation job.
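The custom-job flow described above can be sketched as follows: a column selection made in the UI is turned into a SQL query that is then executed against the data. This is a hedged illustration only; `sqlite3` stands in for Databricks SQL, and the `orders` table and its columns are hypothetical:

```python
# Sketch: building a SQL query from UI-selected columns, then running it.
# sqlite3 is a stand-in for the Databricks SQL engine; all names are
# illustrative, not Lazsa-defined identifiers.
import sqlite3

selected_columns = ["customer_id", "amount"]  # columns picked in the UI
query = f"SELECT {', '.join(selected_columns)} FROM orders WHERE amount > ?"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10, 120.0), (2, 11, 80.0), (3, 10, 50.0)],
)

rows = conn.execute(query, (75.0,)).fetchall()
print(rows)  # [(10, 120.0), (11, 80.0)]
```

The generated query is then embedded in the job code alongside the user-supplied transformation logic.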

To create a Databricks templatized transformation job

  1. Sign in to the Lazsa Platform and navigate to Products.

  2. Select a product and feature. Click the Develop stage of the feature and navigate to Data Pipeline Studio.

  3. Create a pipeline with the following nodes:

    Note: The stages and technologies used in this pipeline are only an example.

    • Data Source - REST API

    • Data Integration - Databricks

    • Data Lake - Amazon S3

    • Data Transformation - Databricks

  4. In the Data Transformation stage, click the Databricks node and select Create Templatized Job to create a transformation job using the out-of-the-box template provided by the Lazsa Platform.

      (Screenshot: Databricks Templatized Transformation Job)

To create a templatized Databricks transformation job, complete the following steps: